Usable speech recognition
Author
Abstract
A growing number of lecture webcasts are archived after being delivered live. Without transcripts, users find it harder to perform tasks that are easy with text documents (retrieval, browsing, skimming). Unfortunately, speech recognition systems do not perform satisfactorily when transcribing lectures. In this paper, we present an overview of the ePresence lecture transcription project, whose goal is to improve the usefulness and usability of automatically generated transcripts of webcast lectures. We achieve this by combining novel speech recognition techniques aimed specifically at increasing the accuracy of webcast transcriptions with an interactive collaborative interface that lets users contribute to improving machine-generated transcripts. We conclude by discussing the challenges of successfully integrating transcripts into archives of webcast lectures, along with possible solutions.
Similar resources
Usable speech measures and their fusion
Usable speech is a novel concept related to the co-channel speech problem. Co-channel speech occurs when more than one person is talking at the same time. The idea of usable speech is to identify and extract those portions of co-channel speech that are still useful for speech processing applications such as speaker identification or speech recognition, which do not work in co-channel environment...
Towards Highly Usable and Robust Spoken Language Technologies for Chinese
This paper gives an overview of our research on Chinese spoken language technologies during the past ten years. It covers fundamental acoustic-phonetic studies of spoken Cantonese, speech corpora development, automatic speech recognition and text-to-speech. Currently our focus is on making these technologies more usable for general users who are not speech experts, and more robust for real-worl...
Developing usable speech criteria for speaker identification technology
Recently, a “usable speech” extraction system [1] was proposed to separate co-channel speech into “usable” frames that are minimally corrupted by interfering speech. Studies indicate [2] that a significant amount of co-channel speech can be considered “usable” for speaker identification (SID). Therefore, it is necessary to establish criteria for usable speech frames for SID. Voiced speech, of wh...
Usable Speech Assignment for Speaker Identification System
Usable speech criteria are proposed to extract minimally corrupted speech for speaker identification in co-channel speech. Extracted usable segments are separated in time and need to be organized into speaker streams for a speaker identification system. In this paper, we focus on organizing extracted usable speech segments into a single stream per speaker by means of a speaker assignment system. We ext...
Evaluation of a Multi-Resolution Dyadic Wavelet Transform Method for usable Speech Detection
Many applications of speech communication and speaker identification suffer from the problem of co-channel speech. This paper deals with a multi-resolution dyadic wavelet transform method for detecting usable segments of co-channel speech that could be processed by a speaker identification system. The method is evaluated on the TIMIT database with reference to the Target to Interferer Rat...
Structure-based Speech Classification Using Non-linear Embedding Techniques
“Usable speech” refers to those portions of corrupted speech that can be used to determine a reasonable number of distinguishing features of the speaker. It has previously been shown that using only voiced segments of speech improves the usable speech detection system, and also that unvoiced speech does not contribute significant information about the speaker(s) for speaker ide...